Inter-sentential Relations in Information Extraction Corpora

Authors

  • Kumutha Swampillai
  • Mark Stevenson
Abstract

In natural language, relationships between entities can be asserted within a single sentence or over many sentences in a document. Many information extraction systems are constrained to extracting binary relations that are asserted within a single sentence (single-sentence relations). This limits the proportion of relations they can extract, since those expressed across multiple sentences (inter-sentential relations) are not considered. The analysis in this paper focuses on finding the distribution of inter-sentential and single-sentence relations in two corpora used for the evaluation of information extraction systems: the MUC6 corpus and the ACE corpus from 2003 (ACE03). In order to carry out this analysis we had to manually mark up all the management succession relations described in the MUC6 corpus. It was found that inter-sentential relations constitute 28.5% and 9.4% of the total number of relations in MUC6 and ACE03 respectively. This places upper bounds of 71.5% and 90.6% respectively on the recall of information extraction systems that do not consider relations asserted across multiple sentences.
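The recall bounds quoted in the abstract follow from simple arithmetic: a system that only extracts single-sentence relations can at best recover every relation that is not inter-sentential. The sketch below reproduces that calculation from the two reported percentages. The function names and the sentence-index representation of a relation are illustrative assumptions, not the authors' code or annotation format.

```python
def is_inter_sentential(arg1_sentence: int, arg2_sentence: int) -> bool:
    """Assumed representation: each argument of a binary relation is
    tagged with the index of the sentence it occurs in. The relation is
    inter-sentential when the two arguments lie in different sentences."""
    return arg1_sentence != arg2_sentence


def recall_upper_bound(inter_sentential_pct: float) -> float:
    """Best achievable recall (in percent) for a system that ignores
    every inter-sentential relation."""
    if not 0.0 <= inter_sentential_pct <= 100.0:
        raise ValueError("percentage must lie between 0 and 100")
    return 100.0 - inter_sentential_pct


# Inter-sentential shares reported in the abstract (percent of all relations).
reported = {"MUC6": 28.5, "ACE03": 9.4}

# Rounded to one decimal place to match the precision of the reported figures.
bounds = {corpus: round(recall_upper_bound(pct), 1)
          for corpus, pct in reported.items()}

for corpus, bound in bounds.items():
    print(f"{corpus}: recall upper bound = {bound}%")
```

The bound holds regardless of how accurate the single-sentence extractor is, because it counts only relations that such a system never attempts to extract.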


Similar resources

Extracting Relations Within and Across Sentences

Previous work on relation extraction has focussed on identifying relationships between entities that occur in the same sentence (intra-sentential relations) rather than between entities in different sentences (inter-sentential relations), despite previous research having shown that inter-sentential relations commonly occur in information extraction corpora. This paper describes an SVM-based approa...

Extracting a Parallel Corpus from Comparable Documents to Improve Translation Quality in Machine Translation Systems

Data used for training statistical machine translation methods are usually prepared from three resources: parallel, non-parallel and comparable text corpora. Parallel corpora are an ideal resource for translation, but because such texts are scarce, non-parallel and comparable corpora are also used for parallel text extraction. Most existing methods for exploiting comparable corpora loo...

The Effect of Intra-sentential, Inter-sentential and Tag-sentential Switching on Teaching Grammar

The present study examined the comparative effect of different types of code-switching, i.e., intra-sentential, inter-sentential, and tag-sentential switching, on EFL learners' grammar learning and teaching. To this end, a sample of 60 Iranian female and male students in two different institutions in Qazvin was selected. They were assigned to four groups. Each group was randomly assigned to one of the...

The ITI TXM Corpora: Tissue Expressions and Protein-Protein Interactions

We report on two large corpora of semantically annotated full-text biomedical research papers created in order to develop information extraction (IE) tools for the TXM project. Both corpora have been annotated with a range of entities (CellLine, Complex, DevelopmentalStage, Disease, DrugCompound, ExperimentalMethod, Fragment, Fusion, GOMOP, Gene, Modification, mRNAcDNA, Mutant, Protein, Tissue)...

Towards a Closer Integration of Termbases, Translation Memories, and Parallel Corpora: -A Translation-Oriented View-

This paper takes a look at how the use of terminological information and bilingual corpora of previously translated texts can improve the performance of translation memories. The focus is on using terminology to support sub-sentential alignment. The author tries to show that the performance of translation memories will not benefit significantly from generalizing the units stored in the memory b...

Publication date: 2010